Partially Supervised Graph Embedding for Positive Unlabelled Feature Selection
Authors
Abstract
Selecting discriminative features in positive unlabelled (PU) learning tasks is challenging because no negative class information is available. Traditional supervised and semi-supervised feature selection methods cannot be applied directly in this scenario, and unsupervised feature selection algorithms, while designed to handle unlabelled data, neglect the information available from the positive class. To leverage the partially observed positive class information, we propose to encode the weakly supervised information in PU learning tasks as pairwise constraints between training instances. Violations of these pairwise constraints are measured and incorporated into a partially supervised graph embedding model. Extensive experiments on several benchmark databases and a real-world cyber security application demonstrate the effectiveness of our algorithm.
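The abstract does not give the model's equations, so the following is only a minimal sketch, under our own assumptions, of the general idea of turning PU labels into pairwise constraints and ranking features by how strongly they violate them. The must-link-only encoding (every pair of known positives should stay close), the normalisation by each feature's variance over all samples, and the function names `must_link_pairs`, `constraint_violation_score` and `select_features` are hypothetical. This is a simplified constraint-score-style filter, not the authors' partially supervised graph embedding.

```python
from itertools import combinations


def must_link_pairs(positive_idx):
    """Hypothetical encoding of PU supervision: every pair of known
    positives becomes a must-link constraint."""
    return list(combinations(sorted(positive_idx), 2))


def constraint_violation_score(X, positive_idx):
    """Score each feature as (must-link violation) / (total spread over
    all positive + unlabelled samples). Lower is better: the feature keeps
    positives close together relative to how much it varies overall.
    This criterion is an illustrative assumption."""
    n_samples = len(X)
    n_features = len(X[0])
    pairs = must_link_pairs(positive_idx)
    scores = []
    for r in range(n_features):
        column = [X[i][r] for i in range(n_samples)]
        mean = sum(column) / n_samples
        spread = sum((v - mean) ** 2 for v in column)
        # Squared differences between constrained (positive) pairs.
        violation = sum((column[i] - column[j]) ** 2 for i, j in pairs)
        # A constant feature carries no information, so rank it last.
        scores.append(violation / spread if spread > 0 else float("inf"))
    return scores


def select_features(X, positive_idx, k):
    """Return the indices of the k features with the lowest scores."""
    scores = constraint_violation_score(X, positive_idx)
    return sorted(range(len(scores)), key=lambda r: scores[r])[:k]


if __name__ == "__main__":
    # Toy data: samples 0-2 are labelled positive, 3-4 are unlabelled.
    X = [
        [1.0, 0.2, 5.0],
        [1.1, 0.9, 5.0],
        [0.9, 0.1, 5.0],
        [3.0, 0.5, 5.0],
        [2.8, 0.7, 5.0],
    ]
    print(select_features(X, [0, 1, 2], 2))
```

On the toy data, feature 0 keeps the positives tightly grouped while separating them from the unlabelled samples, so it ranks first; the constant feature 2 ranks last. A graph embedding model, as described in the abstract, would typically replace the plain variance term with a neighbourhood graph over all samples.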
Similar resources
Markov Blanket Discovery in Positive-Unlabelled and Semi-supervised Data
The importance of Markov blanket discovery algorithms is twofold: as the main building block in constraint-based structure learning of Bayesian network algorithms and as a technique to derive the optimal set of features in filter feature selection approaches. Equally, learning from partially labelled data is a crucial and demanding area of machine learning, and extending techniques from fully t...
Phishing website detection using weighted feature line embedding
The aim of phishing is to obtain users' private information without their permission by designing a new website that mimics a trusted website. Information technology specialists do not agree on a unique definition of the discriminative features that characterize phishing websites. Therefore, the number of reliable training samples in phishing detection problems is limited. M...
BASSUM: A Bayesian semi-supervised method for classification feature selection
Feature selection is an important preprocessing step for building efficient, generalizable and interpretable classifiers on high-dimensional data sets. Assuming sufficient labelled samples, the Markov Blanket provides a complete and sound solution to the selection of optimal features by exploring the conditional independence relationships among the features. In real-world ap...
Feature selection for semi-supervised data analysis in decisional information systems. (Sélection de variables pour l'analyse semi-supervisées des données dans les systèmes d'Information décisionnels)
Feature selection is an important task in data mining and machine learning processes. This task is well known in both supervised and unsupervised contexts. Semi-supervised feature selection is still under development and far from mature. In general, machine learning has been well developed to deal with partially labelled data. Thus, feature selection has obtained special impor...
Dimensionality Reduction of Hyperspectral Images by Combination of Non-parametric Weighted Feature Extraction (nwfe) and Modified Neighborhood Preserving Embedding (npe)
This paper combines two conventional feature extraction methods (NWFE and NPE) in a novel framework and presents a new semi-supervised feature extraction method called Adjusted Semi supervised Discriminant Analysis (ASEDA). The advantages of this method are overcoming the Hughes phenomenon, automatic selection of unlabelled pixels, extraction of more than L-1 (L: number of classes) features and avoidance ...
Journal title:
Volume, Issue:
Pages: -
Publication date: 2016